Skip to content

security: fix critical RCE, IDOR, SSRF, webhook, and rate-limit issues#2

Merged
madhavcodez merged 1 commit into
mainfrom
cleanup/phase-1-security
May 18, 2026
Merged

security: fix critical RCE, IDOR, SSRF, webhook, and rate-limit issues#2
madhavcodez merged 1 commit into
mainfrom
cleanup/phase-1-security

Conversation

@madhavcodez

Copy link
Copy Markdown
Owner

Summary

PR 2 of 4 in the critical-cleanup sweep. Addresses every CRITICAL and the seven highest-severity HIGH findings from the parallel security audit.

Pairs with the CI gates landed in #1.

CRITICAL fixes

1. Python executor RCE (services/crews/tools/python_executor.py)

The previous blocklist was a substring search on the raw code string. Bypassable in seconds:

m = __import__('o' + 's')              # split string defeats `import os` check
importlib.import_module('subprocess')  # importlib wasn't in the blocklist
open('/etc/passwd').read()             # open was never blocked
().__class__.__bases__[0].__subclasses__()  # dunder traversal to subprocess.Popen

Fix: Replaced with RestrictedPython.compile_restricted — AST-level validation that rejects __import__, exec, eval, dunder traversal, and attribute writes at compile time. Globals are limited to safe_builtins + limited_builtins + utility_builtins plus a name-bound stdlib allowlist (math, statistics, collections, itertools, etc.). Stdout captured by PrintCollector. Thread-based wall-clock timeout.

2. Twilio webhook signature validation (api/voice_webhooks.py)

All three endpoints (voice-status, voice-recording, voice-transcription) had zero signature validation. Anyone with the URL could:

  • Inject arbitrary transcripts for any guessable call_sid
  • Forge call statuses (mark completed, failed)
  • Trigger voice_service.process_completed_call against arbitrary records

Fix: New core/webhook_security.verify_twilio_signature — HMAC-SHA1 over the sorted form params + the public URL, signed with TWILIO_AUTH_TOKEN. Uses TWILIO_WEBHOOK_BASE_URL to reconstruct the URL Twilio actually signed when behind a reverse proxy. All three endpoints Depends() on the verifier, so requests are 403'd before any DB access.

3. Resend webhook HMAC (api/webhooks.py)

RESEND_WEBHOOK_SECRET was configured but never imported by the handler. Anyone could POST forged bounce/complaint events to suppress legitimate emails or fabricate EmailEvent rows.

Fix: Svix-style HMAC-SHA256 over {webhook_id}.{timestamp}.{body} with 5-minute replay tolerance. Handles the whsec_ prefix and both svix-* / webhook-* header casings.

4. init_db() silent failure (database.py)

except Exception:
    pass  # Tables already exist from migrations

Swallowed all exceptions including wrong-DSN/permission/network errors — boot succeeded with a dead DB.

Fix: Replaced with an explicit SELECT 1 connectivity check that raises RuntimeError on failure. Base.metadata.create_all() removed — Alembic is the single source of truth for schema, which closes the silent migration-drift bug.

HIGH fixes

5. SSRF on outbound HTTP (core/url_guard.py, both web_scraper.py)

Both scrapers used httpx.AsyncClient(follow_redirects=True) with no IP filter. Prompt-injectable into hitting http://169.254.169.254/... (AWS IMDS) or any internal service.

Fix: New url_guard module with assert_safe_url + safe_http_get:

  • Rejects non-http/https schemes (file://, gopher://, etc.)
  • Resolves all DNS records; rejects if any is loopback / link-local / RFC-1918 / multicast / reserved
  • safe_http_get follows redirects manually with per-hop revalidation — closes the "benign hostname 302s to IMDS" bypass

6–8. IDOR fixes (api/findings.py, api/voice.py, api/missions.py)

All three returned data filtered only by their resource ID, never by user_id. An authenticated attacker could enumerate other users' findings, voice extractions, call transcripts, and mission runs by guessing UUIDs.

Fix: Each list/get/create now joins through Project (or Mission for runs) on user_id. voice.py factored the ownership check into a shared _owned_voice_extraction helper. findings.py and voice.py also gained limit/offset pagination.

9. Auth brute-force (api/auth.py)

/auth/login and /auth/register had zero rate limiting despite slowapi being installed.

Fix: @limiter.limit("5/minute") on login, @limiter.limit("3/minute") on register.

10. LLM-controlled phone numbers (core/phone_guard.py, voice_caller.py)

The LLM could supply any string to voice_caller's phone_number parameter — including premium-rate (+1-900), international, and emergency lines. Real cost / TCPA exposure.

Fix: New phone_guard.validate_outbound_number:

  • Parses via phonenumbers
  • Region must be in US/CA allowlist (configurable)
  • Number type must be FIXED_LINE / MOBILE / TOLL_FREE / VOIP
  • Explicit denylist: 1-900, 1-976, 1-809, 9xx emergency short codes
  • Failure returns a structured tool-error so the LLM can self-correct

11. Voice server CORS (voice/server.py)

allow_origins=["*"] with a WebSocket / future-credentialed surface = cross-site hijacking from any origin.

Fix: Parsed settings.allowed_origins allowlist, narrowed methods/headers.

Tests

backend/tests/test_security_guards.py — unit coverage for each new primitive:

  • 7 python_executor tests confirming the historic bypasses now fail
  • 6 url_guard tests for loopback, IMDS, RFC-1918, IPv6 loopback, file://, missing host
  • 5 phone_guard tests for 1-900, 911, +44, empty input, valid US mobile
  • 3 webhook_security tests for Twilio reference match, Resend valid signature, Resend replay rejection

New dependencies

  • RestrictedPython>=7.4 — the sandbox itself
  • twilio>=9.0.0 — used elsewhere already; pinned here for the SDK
  • phonenumbers>=8.13.0 — E.164 parsing

What this does NOT fix (deferred to PR 3)

These are reliability/perf, not security:

  • APScheduler in-process multi-fire under --workers >1 → PR 3
  • Missing DB indexes → PR 3
  • Sync db.query in async routes → PR 3
  • asyncio.ensure_future (deprecated) → PR 3

Test plan

  • CI green on this branch
  • test_security_guards.py passes
  • Manual: hit /webhooks/twilio/voice-status with no signature → expect 403
  • Manual: hit /webhooks/resend with a stale svix-timestamp → expect 403
  • Manual: invoke python_executor with __import__('o'+'s') → expect status: error
  • Manual: voice_caller with phone_number: "+19005551234" → expect rejection

References

Addresses the four CRITICAL and seven HIGH security findings from the
parallel security audit. Each fix is scoped and individually testable.

CRITICAL — Python executor sandbox (RCE)
- services/crews/tools/python_executor.py: replace the substring blocklist
  (bypassable by ``__import__('o'+'s')``, ``importlib.import_module``,
  ``open()``) with RestrictedPython AST-level compilation, curated
  safe_builtins + limited_builtins + utility_builtins, name-bound stdlib
  allowlist (math/statistics/etc.), PrintCollector for stdout capture,
  thread-based wall-clock timeout

CRITICAL — Webhook signature verification
- core/webhook_security.py: new module
  - verify_twilio_signature: HMAC-SHA1 over sorted form params + public URL
    against TWILIO_AUTH_TOKEN; uses TWILIO_WEBHOOK_BASE_URL to reconstruct
    the URL Twilio signed when behind a proxy
  - verify_resend_signature: Svix-style HMAC-SHA256 over
    {id}.{timestamp}.{body} with 5-minute replay window; handles
    ``whsec_`` prefix; accepts both ``svix-*`` and ``webhook-*`` headers
- api/voice_webhooks.py: all three Twilio endpoints now depend on
  verify_twilio_signature; pre-403 before any DB access
- api/webhooks.py: reads raw body once, validates Resend signature, then
  parses JSON; previously RESEND_WEBHOOK_SECRET was configured but unused

CRITICAL — Silent DB-failure boot
- database.py: replace ``except Exception: pass`` in init_db() with a
  loud connectivity check (``SELECT 1``); raise RuntimeError on failure.
  Alembic is the authority for schema; create_all() removed entirely to
  prevent migration drift from being papered over at boot

HIGH — SSRF on outbound HTTP from agent tools
- core/url_guard.py: assert_safe_url + safe_http_get
  - rejects file:// / non-http schemes
  - rejects RFC-1918, loopback, link-local (169.254/16 — cloud IMDS),
    multicast, reserved, unspecified
  - resolves all DNS records (defends against TOCTOU and multi-record
    attacks)
  - safe_http_get follows redirects manually with per-hop revalidation;
    closes the "benign hostname redirects to IMDS" bypass
- services/crews/tools/web_scraper.py: uses safe_http_get with
  follow_redirects=False
- services/data_sources/connectors/web_scraper.py: same; UnsafeURLError
  caught alongside httpx errors in the unified except chain

HIGH — IDOR fixes
- api/findings.py: list_findings now joins Project on user_id; create_finding
  verifies project ownership; cursor pagination added (limit/offset)
- api/voice.py: ``_owned_voice_extraction`` helper centralises the
  Project-join ownership check; list/get/calls and create all route through
  it
- api/missions.py: list_mission_runs now verifies Mission.user_id == user.id
  before returning runs

HIGH — Auth brute-force protection
- api/auth.py: ``@limiter.limit("5/minute")`` on login,
  ``@limiter.limit("3/minute")`` on register — previously unlimited

HIGH — LLM-controlled phone numbers
- core/phone_guard.py: validate_outbound_number
  - parses + validates E.164 via phonenumbers
  - rejects regions outside US/CA allowlist
  - rejects PREMIUM_RATE / VOICEMAIL / PAGER / SHARED_COST number types
  - explicit denylist for 1-900/1-976/1-809 prefixes and 9xx emergency
    short codes
- services/crews/tools/voice_caller.py: validates phone_number before any
  Twilio path; failure returns a structured tool-error so the LLM can
  self-correct rather than retry

HIGH — Voice server CORS
- voice/server.py: ``allow_origins=["*"]`` replaced with parsed
  ``settings.allowed_origins`` allowlist; methods/headers narrowed

Tests
- backend/tests/test_security_guards.py: unit tests for each new module
  - Confirms the historic bypasses are now blocked (__import__ split,
    importlib, open, exec, dunder traversal)
  - SSRF guard rejects loopback, IMDS, RFC-1918, IPv6 loopback, file://
  - Phone guard rejects 1-900, 911, +44, empty
  - Twilio signature helper matches reference implementation
  - Resend signature verifies on valid payload; rejects 1-hour-old replay

Dependencies
- RestrictedPython >= 7.4
- twilio >= 9.0.0 (only for shared TwiML helpers; signature done locally)
- phonenumbers >= 8.13.0
@madhavcodez madhavcodez merged commit 4a6b20f into main May 18, 2026
@madhavcodez madhavcodez deleted the cleanup/phase-1-security branch May 18, 2026 04:31
madhavcodez added a commit that referenced this pull request Jun 22, 2026
…ecycle (#3)

Addresses the highest-leverage findings from the parallel performance and
reliability review. Each change is independently revertible.

Lifespan refactor (CRITICAL)
- main.py: replace the warn-and-continue init pattern with a
  SubsystemReadiness tracker. Required subsystems (database) fail-fast and
  abort startup; optional subsystems (expert seeding, source_registry,
  scheduler, redis_bridge) degrade gracefully and expose their state via
  the new /health/ready endpoint
- Replace ``next(get_session())`` with ``with SessionLocal() as db:`` so
  the session is properly cleaned up on every path
- Remove the ``stop_scheduler = lambda: None`` rebind smell; the default
  is defined up-front and overwritten only on successful start
- redis_task tracked by name and awaited on shutdown so cancellation is
  observable

Health surface
- api/health.py: new GET /health/ready exposes per-subsystem readiness
  from the lifespan handler. Operators can detect degraded boots without
  digging through logs

Tool-result caching (CRITICAL — wired existing infra)
- core/tool_cache.py: new fail-open Redis cache keyed on (tool, params)
  with deterministic JSON hashing, lazy connection, per-call TTL, and
  ``cached=True`` flag on hits
- services/crews/tools/exa_search.py: wraps the real fetch in
  tool_cache.cached with 1h TTL. Previously every mission re-paid Exa
  for the same queries
- services/crews/tools/gemini_search.py: same treatment for Gemini
  grounding calls

Celery worker lifecycle (HIGH)
- celery_app.py: ``worker_max_tasks_per_child=10`` recycles each worker
  process after 10 tasks to release accumulated LLM-context heap that
  workers don't free between tasks. A research-heavy worker grew several
  GB RSS over a day without this
- ``task_soft_time_limit=300`` + ``task_time_limit=360`` so a stuck HTTP
  call (Gemini latency spike, Twilio webhook timeout) cannot hang a
  queue indefinitely

Database indexes (HIGH)
- alembic/versions/f4a8d2c1e9b0_add_perf_indexes.py: adds
  - ix_missions_status (backs ?status=running dashboard filter)
  - ix_missions_user_created (backs /missions home-page query)
  - ix_findings_mission_confidence (backs per-mission finding list with
    confidence-ordered display)

Background task lifecycle (HIGH)
- core/background_tasks.py: ``spawn_background_task`` keeps a strong
  reference in a module-level set + done-callback that logs uncaught
  exceptions. ``asyncio.ensure_future`` (deprecated 3.10+) without
  reference retention is GC-collectable; tasks would silently disappear
- api/missions.py, api/projects.py: three ``asyncio.ensure_future``
  call sites migrated to ``spawn_background_task``

Error-envelope sanitisation (HIGH)
- api/data_sources.py:183,266 + api/workflows.py:262: stop leaking raw
  exception strings in 5xx responses. ``logger.exception(...)`` captures
  the full traceback server-side; the user gets a generic message + a
  hint to check logs by correlation id

Pagination on unbounded list endpoint (HIGH)
- api/run_steps.py: ``limit``/``offset`` parameters (default 200, max
  1000) so a long-running mission's run trace doesn't return a multi-MB
  blob

Deferred to a follow-up PR
- Wrapping the broader async-route sync ``db.query`` calls in
  ``run_in_threadpool`` (or migrating to asyncpg) needs touching ~33
  router modules and warrants its own PR with a focused test plan
- Moving APScheduler jobs to Celery Beat (CRITICAL #2 in perf review)
  requires reconciling autopilot/monitor/workflow job loaders — also
  deferred
- ReactFlow lazy-load + ActivityFeed memoization land with the broader
  dashboard polish PR

Co-authored-by: Madhav Chauhan <j_s_chauhan@hotmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants